16S-rDNA-V3-V4 repository: https://github.com/ycl6/16S-rDNA-V3-V4

picrust2: GitHub, Documentation, Paper

ALDEx2: GitHub, Documentation, Paper

Demo Dataset: PRJEB27564 from Gut Microbiota in Parkinson’s Disease: Temporal Stability and Relations to Disease Progression. EBioMedicine. 2019;44:691-707

License: GPL-3.0


Workflow (Continues from Part 3)

Note: picrust2 requires Python 3.5 or 3.6

Activate picrust2 environment

Note: Use conda deactivate to deactivate from a currently active conda environment if required

conda activate picrust2

Run picrust2

  • Requires fasta file expr.asv.fasta and biom file expr.biom generated in Part 2: phyloseq
  • The --output argument to specify the output folder for final files
  • The --processes N argument to specify the number of CPUs to run picrust2 in parallel
picrust2_pipeline.py --study_fasta outfiles/expr.asv.fasta --input outfiles/expr.biom \
    --output picrust2_out_stratified --processes 6 \
    --stratified --remove_intermediate --verbose

Completed PICRUSt2 pipeline in 2389.99 seconds.

Note: Use conda deactivate to deactivate the picrust2 environment if required

Locate picrust2 mapfiles

Use the locate command to locate the PATH that keeps the mapfiles

locate description_mapfiles

Example output:

/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/KEGG_modules_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/KEGG_pathways_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/cog_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/ec_level4_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/ko_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/metacyc_pathways_info.txt.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/pfam_info.tsv.gz
/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles/tigrfam_info.tsv.gz

Start R

cd /ngs/16S-Demo/PRJEB27564

R

Load package and set path

library("data.table")   # Also requires R.utils to read gz and bz2 files
library("phyloseq")
library("ALDEx2")
library("dplyr")

Set picrust2 output folder

Full path is not necessary if R is executed in the folder one-level above picrust2_out_stratified

picrust2 = "picrust2_out_stratified"
list.files(picrust2, recursive = TRUE)
##  [1] "EC_metagenome_out/pred_metagenome_contrib.tsv.gz"
##  [2] "EC_metagenome_out/pred_metagenome_unstrat.tsv.gz"
##  [3] "EC_metagenome_out/seqtab_norm.tsv.gz"            
##  [4] "EC_metagenome_out/weighted_nsti.tsv.gz"          
##  [5] "EC_predicted.tsv.gz"                             
##  [6] "KO_metagenome_out/pred_metagenome_contrib.tsv.gz"
##  [7] "KO_metagenome_out/pred_metagenome_unstrat.tsv.gz"
##  [8] "KO_metagenome_out/seqtab_norm.tsv.gz"            
##  [9] "KO_metagenome_out/weighted_nsti.tsv.gz"          
## [10] "KO_predicted.tsv.gz"                             
## [11] "marker_predicted_and_nsti.tsv.gz"                
## [12] "out.tre"                                         
## [13] "pathways_out/path_abun_contrib.tsv.gz"           
## [14] "pathways_out/path_abun_unstrat.tsv.gz"

Build picrust2 output file paths

p2_EC = paste0(picrust2, "/EC_metagenome_out/pred_metagenome_unstrat.tsv.gz")
p2_KO = paste0(picrust2, "/KO_metagenome_out/pred_metagenome_unstrat.tsv.gz")
p2_PW = paste0(picrust2, "/pathways_out/path_abun_unstrat.tsv.gz")

Set picrust2 mapfile folder

Use the “description_mapfiles” PATH you located with the locate command above

mapfile = "/home/user/miniconda3/envs/picrust2/lib/python3.6/site-packages/picrust2/default_files/description_mapfiles"
list.files(mapfile, recursive = TRUE)
## [1] "cog_info.tsv.gz"              "ec_level4_info.tsv.gz"       
## [3] "KEGG_modules_info.tsv.gz"     "KEGG_pathways_info.tsv.gz"   
## [5] "ko_info.tsv.gz"               "metacyc_pathways_info.txt.gz"
## [7] "pfam_info.tsv.gz"             "tigrfam_info.tsv.gz"

Build picrust2 map file paths

mapfile_EC = paste0(mapfile, "/ec_level4_info.tsv.gz")
mapfile_KO = paste0(mapfile, "/ko_info.tsv.gz")
mapfile_PW = paste0(mapfile, "/metacyc_pathways_info.txt.gz")

Load saved workspace from Part 3

load("3_lefse_tutorial.RData")

Prepare input data

Load map files

mapEC = as.data.frame(fread(mapfile_EC, header = FALSE))
colnames(mapEC) = c("function","description")
mapKO = as.data.frame(fread(mapfile_KO, header = FALSE, sep = "\t"))
colnames(mapKO) = c("function","description")
mapPW = as.data.frame(fread(mapfile_PW, header = FALSE))
colnames(mapPW) = c("pathway","description")

Load picrust2 output files

p2EC = as.data.frame(fread(p2_EC))
rownames(p2EC) = p2EC$"function"
p2EC = as.matrix(p2EC[,-1])
p2EC = round(p2EC)

p2KO = as.data.frame(fread(p2_KO))
rownames(p2KO) = p2KO$"function"
p2KO = as.matrix(p2KO[,-1])
p2KO = round(p2KO)

p2PW = as.data.frame(fread(p2_PW))
rownames(p2PW) = p2PW$"pathway"
p2PW = as.matrix(p2PW[,-1])
p2PW = round(p2PW)

Subset results

  • Subset-1: Patient vs. control at baseline
  • Subset-2: Patient followup vs. patient baseline
# Subset-1
p2EC1 = p2EC[,sample_names(ps1a)]
p2KO1 = p2KO[,sample_names(ps1a)]
p2PW1 = p2PW[,sample_names(ps1a)]

# Subset-2
p2EC2 = p2EC[,sample_names(ps1b)]
p2KO2 = p2KO[,sample_names(ps1b)]
p2PW2 = p2PW[,sample_names(ps1b)]

Perform statistical analysis

We use the ANOVA-like differential expression (ALDEx2) compositional data analysis (CoDA) method to perform differential abundance testing between 2 groups/conditions

  • If the test is ‘glm’, then effect should be FALSE. The ‘glm’ option evaluates the data as a one-way ANOVA using the glm and Kruskal-Wallace test
  • If the test is ‘t’, then effect should be set to TRUE. The ‘t’ option evaluates the data as a two-factor experiment using both the Welch’s t and the Wilcoxon rank tests
  • All tests include a Benjamini-Hochberg correction of the raw P values

The mc.samples argument defines the number of Monte Carlo samples to use to estimate the underlying distributions. The default is 128

On Subset-1: Patient vs. control at baseline

set.seed(12345)
system.time({
        aldex2_EC1 = aldex(p2EC1, sample_data(ps1a)$Group, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})
## aldex.clr: generating Monte-Carlo instances and clr values
## operating in serial mode
## removed rows with sums equal to zero
## computing iqlr centering
## data format is OK
## dirichlet samples complete
## transformation complete
## aldex.ttest: doing t-test
## running tests for each MC instance:
## |------------(25%)----------(50%)----------(75%)----------|
## aldex.effect: calculating effect sizes
## operating in serial mode
## sanity check complete
## rab.all  complete
## rab.win  complete
## rab of samples complete
## within sample difference calculated
## between group difference calculated
## group summaries calculated
## effect size calculated
## summarizing output
##    user  system elapsed 
## 136.546  10.196 146.752
set.seed(12345)
system.time({
        aldex2_KO1 = aldex(p2KO1, sample_data(ps1a)$Group, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})
## aldex.clr: generating Monte-Carlo instances and clr values
## operating in serial mode
## removed rows with sums equal to zero
## computing iqlr centering
## data format is OK
## dirichlet samples complete
## transformation complete
## aldex.ttest: doing t-test
## running tests for each MC instance:
## |------------(25%)----------(50%)----------(75%)----------|
## aldex.effect: calculating effect sizes
## operating in serial mode
## sanity check complete
## rab.all  complete
## rab.win  complete
## rab of samples complete
## within sample difference calculated
## between group difference calculated
## group summaries calculated
## effect size calculated
## summarizing output
##    user  system elapsed 
## 421.366  33.110 454.501
set.seed(12345)
system.time({
        aldex2_PW1 = aldex(p2PW1, sample_data(ps1a)$Group, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})
## aldex.clr: generating Monte-Carlo instances and clr values
## operating in serial mode
## removed rows with sums equal to zero
## computing iqlr centering
## data format is OK
## dirichlet samples complete
## transformation complete
## aldex.ttest: doing t-test
## running tests for each MC instance:
## |------------(25%)----------(50%)----------(75%)----------|
## aldex.effect: calculating effect sizes
## operating in serial mode
## sanity check complete
## rab.all  complete
## rab.win  complete
## rab of samples complete
## within sample difference calculated
## between group difference calculated
## group summaries calculated
## effect size calculated
## summarizing output
##    user  system elapsed 
##  28.926   0.359  29.287

On Subset-2: Patient followup vs. patient baseline

set.seed(12345)
system.time({
        aldex2_EC2 = aldex(p2EC2, sample_data(ps1b)$Time, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})
## aldex.clr: generating Monte-Carlo instances and clr values
## operating in serial mode
## removed rows with sums equal to zero
## computing iqlr centering
## data format is OK
## dirichlet samples complete
## transformation complete
## aldex.ttest: doing t-test
## running tests for each MC instance:
## |------------(25%)----------(50%)----------(75%)----------|
## aldex.effect: calculating effect sizes
## operating in serial mode
## sanity check complete
## rab.all  complete
## rab.win  complete
## rab of samples complete
## within sample difference calculated
## between group difference calculated
## group summaries calculated
## effect size calculated
## summarizing output
##    user  system elapsed 
## 133.190   7.324 140.519
set.seed(12345)
system.time({
        aldex2_KO2 = aldex(p2KO2, sample_data(ps1b)$Time, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})
## aldex.clr: generating Monte-Carlo instances and clr values
## operating in serial mode
## removed rows with sums equal to zero
## computing iqlr centering
## data format is OK
## dirichlet samples complete
## transformation complete
## aldex.ttest: doing t-test
## running tests for each MC instance:
## |------------(25%)----------(50%)----------(75%)----------|
## aldex.effect: calculating effect sizes
## operating in serial mode
## sanity check complete
## rab.all  complete
## rab.win  complete
## rab of samples complete
## within sample difference calculated
## between group difference calculated
## group summaries calculated
## effect size calculated
## summarizing output
##    user  system elapsed 
## 425.481  32.935 458.434
set.seed(12345)
system.time({
        aldex2_PW2 = aldex(p2PW2, sample_data(ps1b)$Time, mc.samples = 500, test = "t", 
               effect = TRUE, denom = "iqlr", verbose = TRUE)
})
## aldex.clr: generating Monte-Carlo instances and clr values
## operating in serial mode
## removed rows with sums equal to zero
## computing iqlr centering
## data format is OK
## dirichlet samples complete
## transformation complete
## aldex.ttest: doing t-test
## running tests for each MC instance:
## |------------(25%)----------(50%)----------(75%)----------|
## aldex.effect: calculating effect sizes
## operating in serial mode
## sanity check complete
## rab.all  complete
## rab.win  complete
## rab of samples complete
## within sample difference calculated
## between group difference calculated
## group summaries calculated
## effect size calculated
## summarizing output
##    user  system elapsed 
##  28.959   0.044  29.003

See a ALDEx2 output

head(aldex2_EC1)
##                  rab.all rab.win.control rab.win.patient     diff.btw  diff.win
## EC:1.1.1.1     6.9670134       6.8399074        7.026798  0.137126595 0.6539090
## EC:1.1.1.100   7.8855661       7.8457266        7.910683  0.048284158 0.6339011
## EC:1.1.1.103   0.9638683       1.0588967        0.885164 -0.371719037 1.0002322
## EC:1.1.1.105 -11.5017351     -11.4963559      -11.510944 -0.003672878 4.6253005
## EC:1.1.1.108  -0.9576520      -0.9007452       -1.228159 -0.068446720 2.1561593
## EC:1.1.1.11  -11.2805610     -11.0891518      -11.458349 -0.526569427 5.2556315
##                     effect   overlap      we.ep    we.eBH      wi.ep    wi.eBH
## EC:1.1.1.1    0.1883663606 0.4090000 0.46941039 0.8572397 0.21008573 0.7725745
## EC:1.1.1.100  0.0692058771 0.4575085 0.71847324 0.9360795 0.62759621 0.8728473
## EC:1.1.1.103 -0.3387379155 0.3657269 0.05091847 0.8452762 0.06889368 0.7725086
## EC:1.1.1.105 -0.0007604785 0.4995001 0.52599701 0.9001639 0.54271902 0.8637600
## EC:1.1.1.108 -0.0281155320 0.4871026 0.92164472 0.9880340 0.78177593 0.9262043
## EC:1.1.1.11  -0.0948147749 0.4559088 0.45638179 0.8891290 0.48833184 0.8505679

Check estimated effect size

ALDEx2 authors suggest that an effect size of 1 or greater can be used as significance cutoff

On Subset-1: Patient vs. control at baseline

None of the metabolic predictions show differential abundances between Parkinson’s patients and control subjects at baseline

quantile(aldex2_EC1$effect, seq(0, 1, 0.1))
##           0%          10%          20%          30%          40%          50% 
## -0.552171281 -0.143354294 -0.095777719 -0.047474164  0.002335428  0.049843060 
##          60%          70%          80%          90%         100% 
##  0.085790641  0.132974573  0.166611773  0.220836891  0.608252432
quantile(aldex2_KO1$effect, seq(0, 1, 0.1))
##          0%         10%         20%         30%         40%         50%         60% 
## -0.40817340 -0.13524876 -0.09259347 -0.07091410 -0.01661805  0.03315215  0.10306123 
##         70%         80%         90%        100% 
##  0.15693495  0.20609028  0.24474998  0.59072805
quantile(aldex2_PW1$effect, seq(0, 1, 0.1))
##           0%          10%          20%          30%          40%          50% 
## -0.357440944 -0.128067937 -0.093498475 -0.060033789  0.006498075  0.063239755 
##          60%          70%          80%          90%         100% 
##  0.107109408  0.149953638  0.181277415  0.235775650  0.519397483

On Subset-2: Patient followup vs. patient baseline

None of the metabolic predictions show differential abundances between Parkinson’s patients at baseline and at followup

quantile(aldex2_EC2$effect, seq(0, 1, 0.1))
##          0%         10%         20%         30%         40%         50%         60% 
## -0.38148536 -0.11356733 -0.08393901 -0.06728688 -0.04176192 -0.02038713  0.01096127 
##         70%         80%         90%        100% 
##  0.04289693  0.09061738  0.13007363  0.33177382
quantile(aldex2_KO2$effect, seq(0, 1, 0.1))
##             0%            10%            20%            30%            40%            50% 
## -0.48944686037 -0.15184720082 -0.12977261324 -0.10505150541 -0.06794754617 -0.02990147648 
##            60%            70%            80%            90%           100% 
## -0.00005788612  0.04508318326  0.09080765412  0.11603033035  0.38786073051
quantile(aldex2_PW2$effect, seq(0, 1, 0.1))
##           0%          10%          20%          30%          40%          50% 
## -0.252167477 -0.105002163 -0.053820755 -0.029554967 -0.015065742  0.001523376 
##          60%          70%          80%          90%         100% 
##  0.030659236  0.066819546  0.105738759  0.136623082  0.294871035

Plotting of outputs

Create MW and MA plots

Use aldex.plot function to create MW (fold-change to variance/effect) and MA (Bland-Altman) plots

The plot below shows the relationship between effect size and P values and BH-adjusted P values in the test dataset.

On Subset-1: Patient vs. control at baseline

png("images/ALDEx2_picrust2_MW_MA_1.png", width = 6, height = 8, units = "in", res = 300)
par(mfrow=c(3,2))
aldex.plot(aldex2_EC1, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(EC) MW Plot")

aldex.plot(aldex2_EC1, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Log-ratio abundance", ylab = "Difference")
title(main = "(EC) MA Plot")

aldex.plot(aldex2_KO1, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(KO) MW Plot")

aldex.plot(aldex2_KO1, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Relative abundance", ylab = "Difference")
title(main = "(KO) MA Plot")

aldex.plot(aldex2_PW1, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(PW) MW Plot")

aldex.plot(aldex2_PW1, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Relative abundance", ylab = "Difference")
title(main = "(PW) MA Plot")
invisible(dev.off())
"images/ALDEx2_picrust2_MW_MA_1.png"

“images/ALDEx2_picrust2_MW_MA_1.png”

On Subset-2: Patient followup vs. patient baseline

png("images/ALDEx2_picrust2_MW_MA_2.png", width = 6, height = 8, units = "in", res = 300)
par(mfrow=c(3,2))
aldex.plot(aldex2_EC2, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(EC) MW Plot")

aldex.plot(aldex2_EC2, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Relative abundance", ylab = "Difference")
title(main = "(EC) MA Plot")

aldex.plot(aldex2_KO2, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(KO) MW Plot")

aldex.plot(aldex2_KO2, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Relative abundance", ylab = "Difference")
title(main = "(KO) MA Plot")

aldex.plot(aldex2_PW2, type = "MW", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Dispersion", ylab = "Difference")
title(main = "(PW) MW Plot")

aldex.plot(aldex2_PW2, type = "MA", test = "wilcox", all.cex = 0.4, rare.cex = 0.4, 
       called.cex = 0.6, cutoff = 0.05, xlab = "Relative abundance", ylab = "Difference")
title(main = "(PW) MA Plot")
invisible(dev.off())
"images/ALDEx2_picrust2_MW_MA_2.png"

“images/ALDEx2_picrust2_MW_MA_2.png”

Relationship between effect, difference, and P values

On Subset-1: Patient vs. control at baseline

png("images/ALDEx2_picrust2_P_adjP_1.png", width = 6, height = 8, units = "in", res = 300)
par(mfrow=c(3,2))
plot(aldex2_EC1$effect, aldex2_EC1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(EC) Effect size plot")
points(aldex2_EC1$effect, aldex2_EC1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_EC1$diff.btw, aldex2_EC1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(EC) Volcano plot")
points(aldex2_EC1$diff.btw, aldex2_EC1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")

plot(aldex2_KO1$effect, aldex2_KO1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(KO) Effect size plot")
points(aldex2_KO1$effect, aldex2_KO1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_KO1$diff.btw, aldex2_KO1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(KO) Volcano plot")
points(aldex2_KO1$diff.btw, aldex2_KO1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")

plot(aldex2_PW1$effect, aldex2_PW1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(PW) Effect size plot")
points(aldex2_PW1$effect, aldex2_PW1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_PW1$diff.btw, aldex2_PW1$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(PW) Volcano plot")
points(aldex2_PW1$diff.btw, aldex2_PW1$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
invisible(dev.off())
"images/ALDEx2_picrust2_P_adjP_1.png"

“images/ALDEx2_picrust2_P_adjP_1.png”

On Subset-2: Patient followup vs. patient baseline

png("images/ALDEx2_picrust2_P_adjP_2.png", width = 6, height = 8, units = "in", res = 300)
par(mfrow=c(3,2))
plot(aldex2_EC2$effect, aldex2_EC2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(EC) Effect size plot")
points(aldex2_EC2$effect, aldex2_EC2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_EC2$diff.btw, aldex2_EC2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(EC) Volcano plot")
points(aldex2_EC2$diff.btw, aldex2_EC2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")

plot(aldex2_KO2$effect, aldex2_KO2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(KO) Effect size plot")
points(aldex2_KO2$effect, aldex2_KO2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_KO2$diff.btw, aldex2_KO2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(KO) Volcano plot")
points(aldex2_KO2$diff.btw, aldex2_KO2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")

plot(aldex2_PW2$effect, aldex2_PW2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Effect size", ylab = "P value", main = "(PW) Effect size plot")
points(aldex2_PW2$effect, aldex2_PW2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
legend("bottom", legend = c("P value", "BH-adjusted"), pch = 19, col = c("blue", "red"))

plot(aldex2_PW2$diff.btw, aldex2_PW2$we.ep, log = "y", cex = 0.4, col = "blue", pch = 19, 
     xlab = "Difference", ylab = "P value", main = "(PW) Volcano plot")
points(aldex2_PW2$diff.btw, aldex2_PW2$we.eBH, cex = 0.5, col = "red", pch = 19)
abline(h = 0.05, lty = 2, col = "grey")
invisible(dev.off())
"images/ALDEx2_picrust2_P_adjP_2.png"

“images/ALDEx2_picrust2_P_adjP_2.png”

Create output files

Merge with map file data

# Subset-1
df_EC1 = aldex2_EC1 %>% tibble::rownames_to_column(var = "EC") %>% 
    inner_join(mapEC, by = c("EC" = "function")) %>% arrange(EC)

df_KO1 = aldex2_KO1 %>% tibble::rownames_to_column(var = "KO") %>% 
    inner_join(mapKO, by = c("KO" = "function")) %>% arrange(KO)

df_PW1 = aldex2_PW1 %>% tibble::rownames_to_column(var = "Pathway") %>% 
    inner_join(mapPW, by = c("Pathway" = "pathway")) %>% arrange(Pathway)

# Subset-2
df_EC2 = aldex2_EC2 %>% tibble::rownames_to_column(var = "EC") %>%
        inner_join(mapEC, by = c("EC" = "function")) %>% arrange(EC)

df_KO2 = aldex2_KO2 %>% tibble::rownames_to_column(var = "KO") %>%
        inner_join(mapKO, by = c("KO" = "function")) %>% arrange(KO)

df_PW2 = aldex2_PW2 %>% tibble::rownames_to_column(var = "Pathway") %>%
        inner_join(mapPW, by = c("Pathway" = "pathway")) %>% arrange(Pathway)

Output to file

# Subset-1
write.table(df_EC1, file = "outfiles/ALDEx2_picrust2_EC_results_1.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)
write.table(df_KO1, file = "outfiles/ALDEx2_picrust2_KO_results_1.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)
write.table(df_PW1, file = "outfiles/ALDEx2_picrust2_Pathway_results_1.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)

# Subset-2
write.table(df_EC2, file = "outfiles/ALDEx2_picrust2_EC_results_2.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)
write.table(df_KO2, file = "outfiles/ALDEx2_picrust2_KO_results_2.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)
write.table(df_PW2, file = "outfiles/ALDEx2_picrust2_Pathway_results_2.txt", sep = "\t", quote = F, 
        row.names = F, col.names = T)

Save current workspace

save.image(file = "4_picrust2_tutorial.RData")

Session information

sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /home/ihsuan/miniconda3/envs/p27/lib/libopenblasp-r0.3.7.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
##  [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dplyr_1.0.0       ALDEx2_1.18.0     phyloseq_1.30.0   data.table_1.12.8
## [5] knitr_1.29       
## 
## loaded via a namespace (and not attached):
##   [1] nlme_3.1-148                bitops_1.0-6                matrixStats_0.56.0         
##   [4] bit64_0.9-7                 RColorBrewer_1.1-2          GenomeInfoDb_1.22.1        
##   [7] tools_3.6.2                 backports_1.1.8             R6_2.4.1                   
##  [10] vegan_2.5-6                 rpart_4.1-15                Hmisc_4.4-0                
##  [13] DBI_1.1.0                   BiocGenerics_0.32.0         mgcv_1.8-31                
##  [16] colorspace_1.4-1            permute_0.9-5               ade4_1.7-15                
##  [19] nnet_7.3-14                 tidyselect_1.1.0            gridExtra_2.3              
##  [22] DESeq2_1.26.0               bit_1.1-15.2                compiler_3.6.2             
##  [25] Biobase_2.46.0              htmlTable_2.0.1             DelayedArray_0.12.3        
##  [28] scales_1.1.1                checkmate_2.0.0             genefilter_1.68.0          
##  [31] Rsamtools_2.2.3             stringr_1.4.0               digest_0.6.25              
##  [34] foreign_0.8-73              R.utils_2.9.2               dada2_1.14.1               
##  [37] rmarkdown_2.3               XVector_0.26.0              base64enc_0.1-3            
##  [40] jpeg_0.1-8.1                pkgconfig_2.0.3             htmltools_0.5.0            
##  [43] highr_0.8                   htmlwidgets_1.5.1           rlang_0.4.6                
##  [46] rstudioapi_0.11             RSQLite_2.2.0               generics_0.0.2             
##  [49] hwriter_1.3.2               jsonlite_1.7.0              BiocParallel_1.20.1        
##  [52] R.oo_1.23.0                 acepack_1.4.1               RCurl_1.98-1.2             
##  [55] magrittr_1.5                GenomeInfoDbData_1.2.2      Formula_1.2-3              
##  [58] biomformat_1.14.0           Matrix_1.2-18               Rcpp_1.0.5                 
##  [61] munsell_0.5.0               S4Vectors_0.24.4            Rhdf5lib_1.8.0             
##  [64] ape_5.4                     R.methodsS3_1.8.0           lifecycle_0.2.0            
##  [67] stringi_1.4.6               yaml_2.2.1                  MASS_7.3-51.6              
##  [70] SummarizedExperiment_1.16.1 zlibbioc_1.32.0             rhdf5_2.30.1               
##  [73] plyr_1.8.6                  blob_1.2.1                  grid_3.6.2                 
##  [76] parallel_3.6.2              crayon_1.3.4                lattice_0.20-41            
##  [79] Biostrings_2.54.0           splines_3.6.2               multtest_2.42.0            
##  [82] annotate_1.64.0             locfit_1.5-9.4              pillar_1.4.5               
##  [85] igraph_1.2.5                GenomicRanges_1.38.0        geneplotter_1.64.0         
##  [88] reshape2_1.4.4              codetools_0.2-16            stats4_3.6.2               
##  [91] XML_3.98-1.20               glue_1.4.1                  ShortRead_1.44.3           
##  [94] evaluate_0.14               latticeExtra_0.6-29         RcppParallel_5.0.2         
##  [97] png_0.1-7                   vctrs_0.3.1                 foreach_1.5.0              
## [100] gtable_0.3.0                purrr_0.3.4                 ggplot2_3.3.2              
## [103] xfun_0.15                   xtable_1.8-4                survival_3.2-3             
## [106] tibble_3.0.2                iterators_1.0.12            GenomicAlignments_1.22.1   
## [109] memoise_1.1.0               AnnotationDbi_1.48.0        IRanges_2.20.2             
## [112] cluster_2.1.0               ellipsis_0.3.1